Description of the Data


The data for this post has been scraped from Anibis, a Swiss online marketplace on which users can create listings to sell all kinds of used items.

At the time of scraping, there were about 24,000 real estate listings on the website. Using the rvest and httr libraries in R, I scraped and cleaned all of them. To keep this post brief, the web scraping code is not included. There will be another post about modelling motorcycle and car prices in the machine learning section of my blog.
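Since the scraping code is not part of this post, the following is only a minimal sketch of what a cleaning step can look like, not the code I actually used. It assumes a hypothetical listing markup (the class names `.listing`, `.price` and `.rooms` and the text labels are made up) and parses an inline HTML string with rvest's `minimal_html()` instead of fetching a live page with httr. Prices in Swiss formatting such as `CHF 420'000.-` are reduced to their digits, and listings without a numeric price become `NA`.

```r
library(rvest)

# Hypothetical markup standing in for a fetched results page; the real Anibis
# HTML and its class names are not shown in this post.
page <- minimal_html('
  <div class="listing"><span class="price">CHF 420\'000.-</span><span class="rooms">3.5 Zimmer</span></div>
  <div class="listing"><span class="price">CHF 1\'320\'000.-</span><span class="rooms">7 Zimmer</span></div>
  <div class="listing"><span class="price">Preis auf Anfrage</span><span class="rooms">2.5 Zimmer</span></div>
')

# One node per listing, then one price node and one rooms node per listing
listings   <- html_elements(page, ".listing")
price_text <- html_text2(html_element(listings, ".price"))
rooms_text <- html_text2(html_element(listings, ".rooms"))

# Keep digits only ("CHF 420'000.-" -> 420000); text without digits becomes NA.
# Assumes whole-franc prices: a decimal part would be merged into the number.
clean_price <- function(x) {
  digits <- gsub("[^0-9]", "", x)
  as.numeric(ifelse(digits == "", NA, digits))
}

# Strip the trailing "Zimmer" label and convert the room count to a number
clean_rooms <- function(x) as.numeric(sub("\\s*Zimmer$", "", x))

scraped <- data.frame(
  price_chf = clean_price(price_text),
  rooms     = clean_rooms(rooms_text)
)
scraped
```

Stripping everything but digits is deliberately blunt; it works for integer franc amounts with apostrophe separators but would need refinement for prices with cents or ranges.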

Available variables in the cleaned data set are:

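library(tidyverse)  # dplyr, tidyr, forcats, ggplot2
library(tidytext)   # reorder_within(), scale_y_reordered()
library(scales)     # comma_format()

# dt: the cleaned listings data (scraping code not shown)
glimpse(dt)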
## Rows: 23,725
## Columns: 11
## $ price_chf       <dbl> 420000, 580000, 1320000, 315000, 185000, 365000, 94500…
## $ description     <chr> "APPARTEMENT 67 m2, avec balcon de 10m2", "EXCLUSIVITÉ…
## $ canton          <chr> "Wallis", "Wallis", "Freiburg", "St. Gallen", "Bern", …
## $ long_title      <chr> "Appartement a vendre", "Les Agettes Chalet de charme …
## $ object          <chr> "Wohnung", "Haus", "Haus", "Wohnung", "Wohnung", "Wohn…
## $ category        <chr> "Wohnung", "Chalet", "Villa", "Wohnung", "Wohnung", "W…
## $ rooms           <dbl> 3.5, 4.5, 7.0, 3.0, 2.5, 3.5, 6.0, 3.5, 5.5, NA, 2.0, …
## $ space_sqm       <dbl> NA, 100, 220, 70, 50, 113, 165, 89, 145, NA, 45, 123, …
## $ characteristics <chr> "Balkon / Terrasse / Sitzplatz Garage / Parkplatz", "B…
## $ last_change     <date> 2022-09-09, 2022-09-09, 2022-09-09, 2022-09-09, 2022-…
## $ listing_number  <dbl> 44854896, 44700548, 43908903, 44864848, 44864880, 4486…

In the following, I’ll ask and answer questions that I was interested in while inspecting the data set.
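All charts below sort their bars separately within each facet using `reorder_within()` and `scale_y_reordered()` from tidytext. A small sketch on made-up values shows the mechanism: `reorder_within()` pastes the facet variable onto each label (separated by `___`) and orders the resulting factor by the supplied values, so the same canton can sit at a different position in each facet. `scale_y_reordered()` then strips the suffix again when the axis labels are drawn.

```r
library(tidytext)

# Made-up median values for two cantons in two facets
canton       <- c("Bern", "Wallis", "Bern", "Wallis")
object       <- c("Haus", "Haus", "Wohnung", "Wohnung")
median_value <- c(5, 10, 8, 6)

# Labels become "canton___object"; factor levels are ordered by median_value,
# so Bern sits below Wallis for "Haus" but above it for "Wohnung"
f <- reorder_within(canton, median_value, object)
levels(f)
```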


What are the categories in the data?


dt %>% 
  # keep the categorical columns, minus the free-text fields
  select(where(is.character)) %>% 
  select(-c(description, long_title, characteristics)) %>% 
  # long format: one row per (variable name, value)
  pivot_longer(everything()) %>% 
  drop_na() %>% 
  group_by(name) %>%  
  # keep the 15 most frequent values per variable, lump the rest into "Other"
  mutate(value = fct_lump(value, n = 15)) %>% 
  count(value) %>% 
  # sort the bars by count separately within each facet
  mutate(value = reorder_within(value, n, name)) %>% 
  ggplot(aes(n, value)) +
  geom_col(fill = "midnightblue", alpha = 0.8)  +
  facet_wrap(~ name, scales = "free", ncol = 4) +
  labs(title = "Frequency of Categorical House Properties on anibis.ch",
       subtitle = "Sample Size = 23,725 | Data as of 09/22",
       y = NULL,
       x = "Count") +
  scale_x_continuous(labels = scales::comma_format()) +
  scale_y_reordered() +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", size = 14),
        plot.subtitle = element_text(face = "italic", size = 10, 
                                     colour = "grey50"),
        panel.grid.major.y = element_blank(),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

From the looks of it, touristy cantons are overrepresented in the sample. This could reflect higher usage of Anibis in these regions or, more likely, a bias towards holiday homes on the platform. The latter is not ideal for drawing inferences about the Swiss housing market, but as long as I keep it in mind, I can still use the data.
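One rough way to probe the holiday-home explanation is to compare the share of listings in the `Chalet` category per canton. Treating chalets as a proxy for holiday homes is an assumption on my part, and the toy tibble below is made up; on the real data, `toy` would simply be replaced by `dt`.

```r
library(dplyr)

# Made-up listings; with the real data, use `dt` instead of `toy`
toy <- tibble(
  canton   = c("Wallis", "Wallis", "Wallis", "Wallis",
               "Luzern", "Luzern", "Luzern", "Bern", "Bern"),
  category = c("Chalet", "Chalet", "Wohnung", "Chalet",
               "Wohnung", "Villa", "Wohnung", "Wohnung", "Chalet")
)

# Share of chalet listings per canton, as a rough proxy for holiday homes
chalet_share <- toy %>%
  filter(!is.na(category)) %>%
  group_by(canton) %>%
  summarise(n = n(),
            chalet_share = mean(category == "Chalet"),
            .groups = "drop") %>%
  arrange(desc(chalet_share))

chalet_share
```

A canton with many listings and a high chalet share would support the holiday-home reading; a high listing count with a low chalet share would point more towards heavier platform usage.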


How is price per square metre distributed across Switzerland?


dt %>% 
  select(canton, object, space_sqm, price_chf) %>% 
  # houses and apartments only, dropping listings with missing values
  filter(object %in% c("Haus", "Wohnung")) %>% 
  drop_na() %>% 
  # median and interquartile range of price per square metre
  group_by(canton, object) %>% 
  summarise(median_price = median(price_chf/space_sqm),
            lower = quantile(price_chf/space_sqm, 0.25),
            higher = quantile(price_chf/space_sqm, 0.75),
            n = n()) %>%
  ungroup() %>% 
  # add sample sizes to the labels, sort cantons within each facet,
  # and translate the object types for the facet titles
  mutate(canton = reorder_within(paste0(canton, " (n=", n,")"),
                                 median_price,
                                 within = object), 
         object = ifelse(object == "Haus", "House", "Apartment")) %>% 
  ggplot() +
  geom_point(aes(y = canton, x = median_price)) +
  geom_errorbar(aes(y = canton, xmin = lower, xmax = higher),
                alpha = 0.4) +
  facet_wrap(~ object, scales = "free_y") +
  labs(title = "Price per square metre for Swiss real estate \nlistings by canton on anibis.ch",
       subtitle = "Sample Size = 19,466 | Data as of 09/22\nError bars show 25th and 75th percentiles",
       y = NULL,
       x = NULL) +
  scale_y_reordered() +
  scale_x_continuous(labels = scales::comma_format(suffix = " CHF")) +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", size = 14),
        plot.subtitle = element_text(face = "italic", size = 10, 
                                     colour = "grey50"),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
        panel.grid.minor.x = element_blank())

Looking only at houses and apartments, a clear disparity between cantons emerges. Financial centres such as Zug, Zurich and Geneva are the most expensive. Cantons with lower taxes, such as Zug, Schwyz and Nidwalden, also have high prices per square metre.

At the other end, cantons with a weaker private sector, such as Jura, and remote regions, such as Glarus, are cheaper.
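To make the summary step behind these charts concrete, here is a small sketch on made-up listings (the prices and areas are invented, not taken from the data). It computes price per square metre, then the median and the 25th and 75th percentiles per canton and object type, which correspond to the dot and the error bar ends. `quantile()` uses R's default interpolation (type 7), so with only three listings per group the quartiles fall halfway between observed values.

```r
library(dplyr)

# Made-up listings: three apartments in each of two cantons
toy <- tibble(
  canton    = rep(c("Zug", "Jura"), each = 3),
  object    = "Wohnung",
  price_chf = c(1500000, 1300000, 900000, 400000, 300000, 350000),
  space_sqm = c(100, 100, 90, 100, 75, 70)
)

summary_tbl <- toy %>%
  mutate(price_sqm = price_chf / space_sqm) %>%
  group_by(canton, object) %>%
  # dot = median, error bar = 25th to 75th percentile
  summarise(median_price = median(price_sqm),
            lower  = quantile(price_sqm, 0.25, names = FALSE),
            higher = quantile(price_sqm, 0.75, names = FALSE),
            n = n(),
            .groups = "drop")

summary_tbl
```

Using medians and quartiles rather than means keeps a handful of very expensive villas from dominating a canton's value, which matters for cantons with few listings.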


How is living space distributed across Switzerland?


dt %>% 
  select(canton, object, space_sqm) %>% 
  # houses and apartments only, dropping listings without living space
  filter(object %in% c("Haus", "Wohnung")) %>% 
  drop_na() %>% 
  # median and interquartile range of living space
  group_by(canton, object) %>% 
  summarise(median_space = median(space_sqm),
            lower = quantile(space_sqm, 0.25),
            higher = quantile(space_sqm, 0.75),
            n = n()) %>% 
  ungroup() %>% 
  # add sample sizes to the labels, sort cantons within each facet,
  # and translate the object types for the facet titles
  mutate(canton = reorder_within(paste0(canton, " (n=", n,")"),
                                 median_space,
                                 within = object), 
         object = ifelse(object == "Haus", "House", "Apartment")) %>% 
  ggplot() +
  geom_point(aes(y = canton, x = median_space)) +
  geom_errorbar(aes(y = canton, xmin = lower, xmax = higher),
                alpha = 0.4) +
  facet_wrap(~ object, scales = "free") +
  labs(title = "Living space for Swiss real estate \nlistings by canton on anibis.ch",
       subtitle = "Sample Size = 19,466 | Data as of 09/22\nError bars show 25th and 75th percentiles",
       y = NULL,
       x = NULL) +
  scale_y_reordered() +
  scale_x_continuous(labels = scales::comma_format(suffix = " sqm")) +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", size = 14),
        plot.subtitle = element_text(face = "italic", size = 10, 
                                     colour = "grey50"),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
        panel.grid.minor.x = element_blank())

There is no clearly discernible trend, although some of the more rural cantons show larger living spaces.


How are property prices distributed across Switzerland?


dt %>% 
  select(canton, object, price_chf) %>% 
  # houses and apartments only, dropping listings without a price
  filter(object %in% c("Haus", "Wohnung")) %>% 
  drop_na() %>% 
  # median and interquartile range of listing prices
  group_by(canton, object) %>% 
  summarise(median_price = median(price_chf),
            lower = quantile(price_chf, 0.25),
            higher = quantile(price_chf, 0.75),
            n = n()) %>%
  ungroup() %>% 
  # add sample sizes to the labels, sort cantons within each facet,
  # and translate the object types for the facet titles
  mutate(canton = reorder_within(paste0(canton, " (n=", n,")"),
                                 median_price,
                                 within = object), 
         object = ifelse(object == "Haus", "House", "Apartment")) %>% 
  ggplot() +
  geom_point(aes(y = canton, x = median_price)) +
  geom_errorbar(aes(y = canton, xmin = lower, xmax = higher),
                alpha = 0.4) +
  facet_wrap(~ object, scales = "free") +
  labs(title = "Prices for Swiss real estate \nlistings by canton on anibis.ch",
       subtitle = "Sample Size = 19,466 | Data as of 09/22\nError bars show 25th and 75th percentiles",
       y = NULL,
       x = NULL) +
  scale_y_reordered() +
  scale_x_continuous(labels = scales::comma_format(suffix = " CHF")) +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", size = 14),
        plot.subtitle = element_text(face = "italic", size = 10, 
                                     colour = "grey50"),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
        panel.grid.minor.x = element_blank())

As with price per square metre, the financial centres and the cantons with very strong industry and low taxes top the list, which is unsurprising.
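One way to put a number on how similar the two rankings are is Spearman's rank correlation between the canton medians of total price and of price per square metre, together with a check of whether the same cantons form the top of both lists. The medians below are made up for illustration; on the real data they would come from the two summaries above, joined by canton and object type.

```r
# Made-up canton medians for one object type (not the charted values)
medians <- data.frame(
  canton           = c("Zug", "Genf", "Schwyz", "Glarus", "Jura"),
  median_price     = c(2000000, 1800000, 1200000, 600000, 550000),
  median_price_sqm = c(14000, 12000, 10000, 5000, 5500),
  stringsAsFactors = FALSE
)

# Spearman's rho: Pearson correlation of the two rankings
rho <- cor(medians$median_price, medians$median_price_sqm, method = "spearman")

# Names of the k cantons with the highest values
top_k <- function(values, names, k) names[order(values, decreasing = TRUE)][seq_len(k)]

top_price <- top_k(medians$median_price, medians$canton, 3)
top_sqm   <- top_k(medians$median_price_sqm, medians$canton, 3)

rho                            # 0.9: only Glarus and Jura swap places
identical(top_price, top_sqm)  # TRUE: same top three in the same order
```

A rho close to 1 means the two charts order the cantons almost identically, even if individual cantons shift by a place or two.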


A work by Mathias Steilen